Reading Comprehension

Overview

This directory contains an implementation of the boundary model (variant (b) in the figure below) of the Match LSTM and Answer Pointer network for machine reading comprehension. The idea behind this method is to build a question-aware representation of the passage and to use this representation as input to a pointer network, which identifies the start and end indices of the answer.
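To make the boundary idea concrete, the sketch below shows one common way to decode an answer span from the start and end probability distributions produced by an answer pointer. This is a minimal illustration, not the repository's code; the max_answer_len cap is an assumed heuristic.

import numpy as np

def decode_span(start_probs, end_probs, max_answer_len=30):
    # Pick the span (s, e) with e >= s that maximizes
    # P(start = s) * P(end = e); max_answer_len is an assumed cap.
    best_score, best_span = -1.0, (0, 0)
    for s, p_start in enumerate(start_probs):
        for e in range(s, min(s + max_answer_len, len(end_probs))):
            score = p_start * end_probs[e]
            if score > best_score:
                best_score, best_span = score, (s, e)
    return best_span

# Toy distributions over a four-token passage: the best span is (1, 2).
print(decode_span(np.array([0.1, 0.6, 0.2, 0.1]),
                  np.array([0.05, 0.15, 0.7, 0.1])))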

Model Architecture

[Figure: Match LSTM and Answer Pointer model architecture (MatchLSTM_Model.png)]

Files

  • examples/reading_comprehension/train.py - Implements the end-to-end model along with the training procedure
  • examples/reading_comprehension/utils.py - Implements utility functions to set up the data loader and for evaluation
  • examples/reading_comprehension/prepare_data.py - Implements the pipeline to preprocess the dataset
  • nlp_architect/models/matchlstm_ansptr.py - Defines the end-to-end Match LSTM and Answer Pointer network for reading comprehension

Dataset

This repository uses the SQuAD dataset. The preprocessing steps required prior to training are listed below:

1. Create a data directory inside the examples folder:

cd examples/reading_comprehension/; mkdir data

2. Download the official SQuAD-v1.1 training (train-v1.1.json) and development (dev-v1.1.json) datasets and place the extracted json files in the data directory. For more information about SQuAD, please visit https://rajpurkar.github.io/SQuAD-explorer/. The terms and conditions of the dataset license apply. Intel does not grant any rights to the data files.

3. Download the GloVe pretrained embeddings from http://nlp.stanford.edu/data/glove.6B.zip and copy the glove.6B.300d.txt file into the data directory. For more information about GloVe, please visit https://nlp.stanford.edu/projects/glove/. The terms and conditions of the dataset license apply. Intel does not grant any rights to the data files.

4. Preprocess the dataset using the following command:

python examples/reading_comprehension/prepare_data.py --data_path data/
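For intuition, preprocessing of this kind typically tokenizes the passages and questions and maps tokens to rows of the GloVe embedding matrix. The sketch below is an assumption about the general approach, not the actual prepare_data.py logic; it shows how the glove.6B.300d.txt file can be parsed:

import numpy as np

def load_glove(path="data/glove.6B.300d.txt"):
    # Each line of the GloVe file is: word v1 v2 ... v300
    word_to_vec = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word_to_vec[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return word_to_vec

embeddings = load_glove()
print(len(embeddings), embeddings["the"].shape)  # 400k-word vocab, (300,) vectors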

Running Modalities

Training & Inference

Train the model using the following command:

python examples/reading_comprehension/train.py --data_path data/

To visualize predicted answers for paragraphs and questions in the validation dataset (i.e., run inference with batch_size=1), use the following command:

python train.py --restore_model=True --inference_mode=True --data_path=data/ --model_dir=/path/to/trained_model/ --batch_size=1 --num_examples=50
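At batch_size=1, each prediction is a (start, end) index pair into the paragraph; a hypothetical helper like the one below (not part of the repository) illustrates how such a span maps back to answer text:

def span_to_text(paragraph_tokens, start, end):
    # Recover the answer string from a decoded (start, end) span (inclusive).
    return " ".join(paragraph_tokens[start:end + 1])

print(span_to_text(["the", "cat", "sat", "on", "the", "mat"], 1, 2))  # "cat sat"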

The command line options available are:

--data_path       enter the path to the preprocessed dataset
--max_para_req    enter the maximum paragraph length used to truncate the dataset. Default is 300.
--epochs          enter the number of epochs to train. Default is 15.
--gpu_id          select the id of the GPU used to train the model. Default is 0.
--train_set_size  enter the size of the training set. By default all examples are used for training.
--batch_size      enter the batch size. Default is 64.
--hidden_size     enter the number of hidden units. Default is 150.
--model_dir       enter the path to save/load the model.
--select_device   select the device to run training on (CPU, GPU, etc.)
--restore_model   choose whether to restore training from a previously saved model. Default is False.
--inference_mode  choose whether to run inference only
--num_examples    enter the number of examples to run inference on. Default is 50.
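For example, a hypothetical invocation that resumes training from a previously saved model on GPU 1 with a smaller batch size would combine the flags above:

python examples/reading_comprehension/train.py --data_path=data/ --gpu_id=1 --batch_size=32 --restore_model=True --model_dir=/path/to/trained_model/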

Results

After training starts, you will see output similar to this:

Loading Embeddings
creating training and development sets
Match LSTM Pass
Answer Pointer Pass
Setting up Loss
Set up optimizer
Begin Training
Epoch Number:  0
iteration = 1, train loss = 13.156427383422852
F1_Score and EM_score are 0.0 0.0
iteration = 21, train loss = 12.441322326660156
F1_Score and EM_score are 8.333333333333332 0.0
iteration = 41, train loss = 10.773386001586914
F1_Score and EM_score are 6.25 6.25
iteration = 61, train loss = 11.69123649597168
F1_Score and EM_score are 6.25 6.25

Please note that after each epoch you will see the validation F1 and EM scores being printed out. These numbers are the result of a much stricter evaluation and are lower than the official evaluation numbers.
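For reference, SQuAD-style F1 is a token-level overlap score, while EM requires the prediction to match a ground-truth answer exactly. A minimal sketch follows; the repository's utils.py may differ in details such as answer normalization:

from collections import Counter

def f1_score(prediction_tokens, truth_tokens):
    # Token-level precision/recall between prediction and ground truth.
    common = Counter(prediction_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(prediction_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

def exact_match(prediction_tokens, truth_tokens):
    # 1.0 only if the prediction matches the ground truth token for token.
    return float(prediction_tokens == truth_tokens)

print(f1_score(["denver", "broncos"], ["the", "denver", "broncos"]))     # 0.8
print(exact_match(["denver", "broncos"], ["the", "denver", "broncos"]))  # 0.0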

With the default settings (a training set of 85,387 examples and a development set of 10,130 examples), after 15 epochs you should expect F1 and EM scores on the development set similar to:

F1 Score: ~62%
EM Score: ~48%

References

[1] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv:1606.05250 [cs.CL], https://arxiv.org/abs/1606.05250. License: https://creativecommons.org/licenses/by-sa/4.0/legalcode
[2] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global Vectors for Word Representation. 2014. https://nlp.stanford.edu/pubs/glove.pdf. License: http://www.opendatacommons.org/licenses/pddl/1.0/
[3] Shuohang Wang and Jing Jiang. Machine Comprehension Using Match-LSTM and Answer Pointer. 2016. arXiv:1608.07905, https://arxiv.org/abs/1608.07905.